-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
BUG: resampling with NaT in TimedeltaIndex (#13223) #14649
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Current coverage is 85.28% (diff: 100%)@@ master #14649 diff @@
==========================================
Files 140 140
Lines 50693 50698 +5
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 43234 43240 +6
+ Misses 7459 7458 -1
Partials 0 0
|
Codecov Report
@@ Coverage Diff @@
## master #14649 +/- ##
==========================================
+ Coverage 86.33% 90.97% +4.64%
==========================================
Files 139 145 +6
Lines 51149 49481 -1668
==========================================
+ Hits 44157 45014 +857
+ Misses 6992 4467 -2525
Continue to review full report at Codecov.
|
pandas/tseries/resample.py
Outdated
binner = binner.insert(0, tslib.NaT) | ||
labels = labels.insert(0, tslib.NaT) | ||
|
||
n_NaT = sum([ax_i is tslib.NaT for ax_i in ax]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
n_NaT = ax._isnan.sum()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change. Thanks
@@ -1204,6 +1206,13 @@ def _get_time_delta_bins(self, ax): | |||
end_stamps = labels + 1 | |||
bins = ax.searchsorted(end_stamps, side='left') | |||
|
|||
if ax.hasnans: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though we actually need to ignore the nans, doesn't this actually create a NaT group? this is inconcistent with grouping.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
NaT group will be ignored in aggregation. It is handled the same way as in function _get_time_bins
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here on what you are doing
@@ -970,6 +970,15 @@ def test_resample_timedelta_idempotency(self): | |||
expected = series | |||
assert_series_equal(result, expected) | |||
|
|||
def test_resample_timedelta_missing_values(self): | |||
# GH 13223 | |||
index = pd.to_timedelta(['0s', pd.NaT, '2s']) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
test with an all NaT index
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
@@ -1189,13 +1190,18 @@ def _get_time_delta_bins(self, ax): | |||
raise TypeError('axis must be a TimedeltaIndex, but got ' | |||
'an instance of %r' % type(ax).__name__) | |||
|
|||
if len(ax) > 0 and all(ax._isnan): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would give a better error message here. Is this consistent with how we handle all-nan grouping?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All-nan grouping doesn't seem to be handled elsewhere. Any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we currently ignore all nan groups, that's why I think it is there, so this is consistent, maybe a comment is worth it here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so this is all data is nan. Hmm, I would just give a better error message then.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ax.hasnans
can you rebase. |
# all NaT | ||
index = pd.to_timedelta([pd.NaT, pd.NaT, pd.NaT]) | ||
series = pd.Series([2, 3, 5], index=index) | ||
self.assertRaises(DataError, series.resample('1s').mean) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably the right idea (e.g. raising a nice message),
e.g. we do this for all-nan groups (which is unfriendly)
In [8]: s = Series([1,2,3],[np.nan,np.nan,np.nan])
In [9]: s.groupby(s.index).sum()
TypeError: unhashable type: 'Float64Index'
d24693c
to
6378e38
Compare
Rebased |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good. some doc fixups
pls add a whatsnew entry in bug fixes (0.20.0).
ping on green.
@@ -1189,13 +1190,18 @@ def _get_time_delta_bins(self, ax): | |||
raise TypeError('axis must be a TimedeltaIndex, but got ' | |||
'an instance of %r' % type(ax).__name__) | |||
|
|||
if len(ax) > 0 and all(ax._isnan): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use ax.hasnans
@@ -1204,6 +1206,13 @@ def _get_time_delta_bins(self, ax): | |||
end_stamps = labels + 1 | |||
bins = ax.searchsorted(end_stamps, side='left') | |||
|
|||
if ax.hasnans: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment here on what you are doing
if not len(ax): | ||
binner = labels = TimedeltaIndex( | ||
data=[], freq=self.freq, name=ax.name) | ||
return binner, [], labels | ||
|
||
start = ax[0] | ||
end = ax[-1] | ||
# Addresses GH #13223 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
add a comment on what you are doing here (and why not just selecting start/end)
In `io/parsers/_try_convert_dates()` when selecting columns based on a column index from a set of columns with multi- level names, the column `name` was converted to a string. This appears to be a bug since the `name` was a tuple before the conversion. This causes problems downstream when there is an attempt to use this name to lookup a column, and that lookup fails because the desired column is keyed from the tuple, not its string representation closes pandas-dev#15376 Author: Stephen Rauch <[email protected]> Closes pandas-dev#15378 from stephenrauch/fix_read_csv_merge_datetime and squashes the following commits: 030f5ec [Stephen Rauch] BUG: Parse two date columns broken in read_csv with multiple headers
closes pandas-dev#15426 Author: Stephen Rauch <[email protected]> Closes pandas-dev#15433 from stephenrauch/tz-lost-in-groupby-agg and squashes the following commits: 64a84ca [Stephen Rauch] BUG: GH15426 timezone lost in groupby-agg with cython functions
column names in python 2. closes pandas-dev#11879 closes pandas-dev#13462
in assert_frame_equal, if check_like, the former code reindex_like before shape comparison. for example: if left.shape=(2,2), right.shpae=(2.0), after reindex_like, left.shape=(2,0),right.shape=(2,0),then the shape comparison will not find out that the two dataframes are different. For that, the assert_frame_equal will not raise assertion errors. But in fact it should raise. Author: jojomdt <[email protected]> Closes pandas-dev#15496 from jojomdt/master and squashes the following commits: 7b3437b [jojomdt] fix test_frame_equal_message error 0340b5c [jojomdt] change check_like description c03e0af [jojomdt] add test for TestAssertFrameEqual 470dbaa [jojomdt] combine row and column shape comparison ce7bd74 [jojomdt] reindex_like after shape comparison
…_layout() (pandas-dev#15515) * Add unit test for pandas-dev#9351 * Tweaks. * add _check_plot_works; rm aux method * Add whatsnew entry.
closes pandas-dev#15347 Author: Jeff Reback <[email protected]> Closes pandas-dev#15484 from jreback/gbq and squashes the following commits: 0fd8d06 [Jeff Reback] wip 3222de1 [Jeff Reback] CLN: remove pandas/io/gbq.py and tests and replace with pandas-gbq
The transform() operation needs to return a like-indexed. To facilitate this, transform starts with a copy of the original series. Then, after the computation for each group, sets the appropriate elements of the copied series equal to the result. At that point is does a type comparison, and discovers that the timedelta is not cast- able to a datetime. closes pandas-dev#10972 Author: Jeff Reback <[email protected]> Author: Stephen Rauch <[email protected]> Closes pandas-dev#15430 from stephenrauch/group-by-transform-timedelta-from-datetime and squashes the following commits: c3b0dd0 [Jeff Reback] PEP fix 2f48549 [Jeff Reback] fixup slow transforms cc43503 [Stephen Rauch] BUG: GH15429 transform result of timedelta from datetime
…11444, pandas-dev#13046 make sure .size includes the name of the grouped
…-dev#15523) * DOC: Update contributing for test_fast, fix doc Windows build * add pip install for xdist
…s-dev#15879) DOC: remove vbench instructions from contributing.rst
* DOC: update contributing.rst for ci * typos & auto-cancel links * make it a note * add back accid deleted section
When cleaning `na_values` during initialization of `TextFileReader`, we return a `list` whenever we specify that `na_values` should be empty. However, the rest of the code expects a `set`. Closes pandas-dev#15835. Author: gfyoung <[email protected]> Closes pandas-dev#15881 from gfyoung/keep-default-na-excel and squashes the following commits: 0bb6f64 [gfyoung] BUG: Patch handling no NA values in TextFileReader
closes pandas-dev#14800 Author: Jeff Reback <[email protected]> Closes pandas-dev#15541 from jreback/exceptions and squashes the following commits: e5fbdc8 [Jeff Reback] give nicer deprecation / message on infer_dtype moving ab4525b [Jeff Reback] typo on pandas.errors in whatsnew d636ef7 [Jeff Reback] document removed exceptions 3dc4b9a [Jeff Reback] more docs for exceptions 2bb1fbd [Jeff Reback] remove AmbiguousIndexError, completely unused 5754630 [Jeff Reback] fix doc-string 35d225f [Jeff Reback] more examples e91901d [Jeff Reback] DOC: better docs on infer_type 7e8432d [Jeff Reback] remove need for PandasError sub-class 92b2fdc [Jeff Reback] corrections 991fbb4 [Jeff Reback] API: expose pandas.errors eec40cd [Jeff Reback] add pandas.api.lib add infer_dtype to pandas.api.lib
xref pandas-dev#12640 xref pandas-dev#14876 Author: Aleksey Bilogur <[email protected]> Closes pandas-dev#15521 from ResidentMario/12640 and squashes the following commits: 1657246 [Aleksey Bilogur] two doc changes 28a38f2 [Aleksey Bilogur] tweak whatsnew entry. 5f306a9 [Aleksey Bilogur] +whatsnew ff895fe [Aleksey Bilogur] Add tests, update docs. 11f3fe4 [Aleksey Bilogur] rm stray debug. 3cbbed5 [Aleksey Bilogur] Melt docstring. d54dc2f [Aleksey Bilogur] +pd.DataFrame.melt.
… like closes pandas-dev#15869 Author: Jeff Reback <[email protected]> Closes pandas-dev#15892 from jreback/construct and squashes the following commits: 6bf2148 [Jeff Reback] fix perf 7fcd4e5 [Jeff Reback] BUG: Bug in DataFrame construction with nulls and datetimes in a list-like
* Citing source in README file For GH users who strictly or heavily use the web-view instead of a local Git, having a direct link is handy, as it does not require downloading the PDF _if_ the user wanted to go to the source of it directly. It's an alternative that allows those interested in more uploads similar to this PDF from the same author(s). * jorisvandenbossche's feedback I re-read the PDF and made sure the wording reflected the content presented. I also changed the source-citing so that is more friendly for .TXT files instead of Markdown or unspecified. * Update README.txt * English enhancement Improved sentence structure for English speakers.
xref pandas-dev#15299 Author: Jeff Reback <[email protected]> Closes pandas-dev#15902 from jreback/series_n and squashes the following commits: 657eac8 [Jeff Reback] TST: better testing of Series.nlargest/nsmallest
1) Allows for more uniform handling of invalid file buffers to our `read_*` functions. 2) Adds a ton of new documentation to `inference.py` Closes pandas-dev#15337. xref pandas-dev#15895. Author: gfyoung <[email protected]> Closes pandas-dev#15894 from gfyoung/validate-file-like and squashes the following commits: 5a8f8da [gfyoung] DOC: Document all of inference.py 81103f7 [gfyoung] ENH: Add file buffer validation to I/O ops
- [x] closes pandas-dev#14855 - [x] tests passed - [x] passes ``git diff upstream/master | flake8 --diff`` Author: alexandercbooth <[email protected]> This patch had conflicts when merged, resolved by Committer: Tom Augspurger <[email protected]> Closes pandas-dev#14871 from alexandercbooth/fix-color-scatterm-bug and squashes the following commits: 3245f09 [alexandercbooth] DOC: moving whatsnew entry to 0.20.0 8ff5f51 [alexandercbooth] BUG: addresses pandas-dev#14855 by fixing color kwarg conflict
* DOC: Fix a typo in indexing.rst * more typos fixed
closes pandas-dev#15297 Author: Roger Thomas <[email protected]> Closes pandas-dev#15299 from RogerThomas/fix_nsmallest_nlargest_with_n_identical_values and squashes the following commits: d3964f8 [Roger Thomas] Fix nsmallest/nlargest With Identical Values
* CLN: clean up select_n algos * CLN: clean ensure_data closes pandas-dev#15903 * return ndtype, so can eliminate special cases * unique * fixups
rebased |
when you rebase you should end up with a small number of commits. try
|
closing. this needs a new PR with a cherry-pick on top of master. |
git diff upstream/master | flake8 --diff